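We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge. Parti treats text-to-image generation as a sequence-to-sequence modeling problem, akin to machine translation, with sequences of image tokens as the target outputs rather than text tokens in another language. This strategy can naturally tap into the rich body of prior work on large language models, which have seen continued advances in capabilities and performance through scaling data and model sizes. Our approach is simple: First, Parti uses a Transformer-based image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens. Second, we achieve consistent quality improvements by scaling the encoder-decoder Transformer model up to 20B parameters, with a new state-of-the-art zero-shot FID score of 7.23 and a finetuned FID score of 3.22 on MS-COCO. Our detailed analysis on Localized Narratives as well as PartiPrompts (P2), a new holistic benchmark of over 1600 English prompts, demonstrates the effectiveness of Parti across a wide variety of categories and difficulty aspects. We also explore and highlight limitations of our models in order to define and exemplify key areas of focus for further improvements. See https://parti.research.google/ for high-resolution images.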
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
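For readers unfamiliar with the K-fold cross-validation mentioned in the survey, the following is a minimal sketch of how a training set can be split into folds. It is an illustration only: the function name `k_fold_indices` and its parameters are hypothetical and are not taken from the survey or from any participant's code.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, validation) index pairs.

    Hypothetical helper for illustration; no shuffling is performed.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Distribute the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        validation = indices[start:start + size]
        # Everything outside the validation slice is used for training.
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


if __name__ == "__main__":
    for train, validation in k_fold_indices(10, 3):
        print(len(train), validation)
```

Each sample appears in exactly one validation fold, so every training case contributes once to the validation estimate. In practice the indices would usually be shuffled first, and for medical data the split is often made per patient rather than per image.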
Recently, graph neural networks have been gaining a lot of attention to simulate dynamical systems due to their inductive nature leading to zero-shot generalizability. Similarly, physics-informed inductive biases in deep-learning frameworks have been shown to give superior performance in learning the dynamics of physical systems. There is a growing volume of literature that attempts to combine these two approaches. Here, we evaluate the performance of thirteen different graph neural networks, namely, Hamiltonian and Lagrangian graph neural networks, graph neural ODE, and their variants with explicit constraints and different architectures. We briefly explain the theoretical formulation highlighting the similarities and differences in the inductive biases and graph architecture of these systems. We evaluate these models on spring, pendulum, gravitational, and 3D deformable solid systems to compare the performance in terms of rollout error, conserved quantities such as energy and momentum, and generalizability to unseen system sizes. Our study demonstrates that GNNs with additional inductive biases, such as explicit constraints and decoupling of kinetic and potential energies, exhibit significantly enhanced performance. Further, all the physics-informed GNNs exhibit zero-shot generalizability to system sizes an order of magnitude larger than the training system, thus providing a promising route to simulate large-scale realistic systems.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
With recent developments in Social Computing, Natural Language Processing and Clinical Psychology, the social NLP research community is addressing the challenge of automating the detection of mental illness on social media. A recent extension to the problem of multi-class classification of mental health issues is to identify the cause behind the user's intention. However, multi-class causal categorization for mental health issues on social media faces a major challenge: wrong predictions caused by overlapping causal explanations. Two directions may mitigate this problem: (i) addressing inconsistency among causal explanations and inappropriate human-annotated inferences in the dataset, and (ii) in-depth analysis of arguments and stances in self-reported text using discourse analysis. In this research work, we hypothesise that if there is inconsistency among the F1 scores of different classes, there must be inconsistency among the corresponding causal explanations as well. In this task, we fine-tune the classifiers and find explanations for multi-class causal categorization of mental illness on social media with the LIME and Integrated Gradients (IG) methods. We test our methods on the CAMS dataset and validate them against annotated interpretations. A key contribution of this research work is to find the reason behind the inconsistency in accuracy of multi-class causal categorization. The effectiveness of our methods is evident from the results obtained, with category-wise average scores of $81.29 \%$ and $0.906$ using cosine similarity and word mover's distance, respectively.
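To make the cosine-similarity score concrete, here is a minimal sketch of comparing a model explanation with an annotated interpretation. The abstract does not say how the texts are vectorized, so the bag-of-words representation below is an assumption, and the function names `cosine_similarity` and `bag_of_words_similarity` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


def bag_of_words_similarity(text_a, text_b):
    """Compare two texts via word counts over their shared vocabulary.

    Assumption: whitespace tokenization and raw counts; the paper may
    use a different representation (for example, word embeddings).
    """
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    vocab = sorted(set(counts_a) | set(counts_b))
    return cosine_similarity(
        [counts_a[w] for w in vocab],
        [counts_b[w] for w in vocab],
    )


if __name__ == "__main__":
    print(bag_of_words_similarity("lost my job", "job loss and debt"))
```

A score near 1 means the explanation and the annotation use nearly the same words in similar proportions; a score of 0 means they share no words.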
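We propose a novel flow-synthesis-based visual servoing framework that enables long-range obstacle avoidance for Micro Air Vehicles (MAVs) flying among tall skyscrapers. Recent deep-learning-based frameworks use optical flow to perform high-precision visual servoing. In this paper, we explore the question: can we design a surrogate flow for these high-precision visual servoing methods that leads to obstacle avoidance? We revisit the concept of saliency to identify high-rise structures in the line of attack, among other competing skyscrapers and buildings, as collision obstacles. A synthesized flow is used to displace the salient object segmentation mask. This flow is computed such that the visual servoing controller maneuvers the MAV safely around the obstacle. In this approach, we use multi-step Cross-Entropy Method (CEM) based servo control to achieve flow convergence, resulting in obstacle avoidance. We use this novel pipeline to successfully and persistently maneuver around high-rises and reach the goal in simulated and photo-realistic real-world scenes. We conduct extensive experiments and compare our approach with optical-flow and short-range depth-based obstacle avoidance methods to demonstrate the merits of the proposed framework. Additional visualizations can be found at https://sites.google.com/view/munocular-obstacle/home.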
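As background for the CEM-based servo control mentioned above, the following is a minimal sketch of the generic Cross-Entropy Method on a toy one-dimensional cost. It is not the authors' controller: the paper's cost would involve flow error and multi-step velocity commands, whereas the function name, parameters, and quadratic cost here are all hypothetical.

```python
import random
import statistics


def cross_entropy_method(cost, mean, std, n_samples=50, n_elite=10,
                         n_iters=30, seed=0):
    """Minimize a scalar cost with a Gaussian sampling distribution.

    Each iteration samples candidates, keeps the lowest-cost "elite"
    subset, and refits the Gaussian to those elites.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    for _ in range(n_iters):
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        elites = sorted(samples, key=cost)[:n_elite]
        mean = statistics.fmean(elites)
        # Small floor keeps the standard deviation strictly positive.
        std = statistics.pstdev(elites) + 1e-6
    return mean


if __name__ == "__main__":
    # Toy cost with its minimum at x = 3 (an assumption for this demo).
    best = cross_entropy_method(lambda x: (x - 3.0) ** 2, mean=0.0, std=5.0)
    print(best)
```

In a servoing setting the sampled quantity would be a control command (or a short sequence of commands) and the cost would measure how far the predicted flow is from the desired one. The sample-select-refit loop stays the same.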
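The promise of multimodal models for real-world applications has inspired research in visualizing and understanding their internal mechanics, with the end goal of empowering stakeholders to visualize model behavior, perform model debugging, and promote trust in machine learning models. However, modern multimodal models are typically black-box neural networks, which makes it challenging to understand their internal mechanics. How can we visualize the internal modeling of multimodal interactions in these models? Our paper aims to fill this gap by proposing MultiViz, a method for analyzing the behavior of multimodal models by dividing the problem of interpretability into 4 stages: (1) unimodal importance: how each modality contributes towards downstream modeling and prediction, (2) cross-modal interactions: how different modalities relate to each other, (3) multimodal representations: how unimodal and cross-modal interactions are represented in decision-level features, and (4) multimodal prediction: how decision-level features are composed to make a prediction. MultiViz is designed to operate on diverse modalities, models, tasks, and research areas. Through experiments on 8 trained models across 6 real-world tasks, we show that the complementary stages in MultiViz together enable users to (1) simulate model predictions, (2) assign interpretable concepts to features, (3) perform error analysis on model misclassifications, and (4) use insights from error analysis to debug models. MultiViz is publicly available, will be regularly updated with new interpretation tools and metrics, and welcomes input from the community.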
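We present a scalable approach for learning open-world object-goal navigation (ObjectNav), the task of asking a virtual robot (agent) to find any instance of an object in an unexplored environment (e.g., "find a sink"). Our approach is entirely zero-shot, i.e., it does not require ObjectNav rewards or demonstrations of any kind. Instead, we train on the image-goal navigation (ImageNav) task, in which agents find the location where a picture (i.e., the goal image) was captured. Specifically, we encode goal images into a multimodal, semantic embedding space to enable training semantic-goal navigation (SemanticNav) agents at scale in unannotated 3D environments (e.g., HM3D). After training, SemanticNav agents can be instructed to find objects described in free-form natural language (e.g., "sink", "bathroom sink", etc.) by projecting language goals into the same multimodal, semantic embedding space. As a result, our approach enables open-world ObjectNav. We extensively evaluate our agents on three ObjectNav datasets (Gibson, HM3D, and MP3D) and observe absolute improvements in success of 4.2% to 20.0%. For reference, these gains are similar to or better than the 5% improvement in success between the winners of the 2020 and 2021 ObjectNav challenges. In an open-world setting, we find that our agents can generalize to compound instructions in which a room is explicitly mentioned (e.g., "find a kitchen sink") and in which the target room can be inferred (e.g., "find a sink and a stove").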
With the rapid development of artificial intelligence (AI) in medical image processing, deep learning in color fundus photography (CFP) analysis is also evolving. Although there are some open-source, labeled datasets of CFPs in the ophthalmology community, large-scale datasets for screening only have labels of disease categories, and datasets with annotations of fundus structures are usually small in size. In addition, labeling standards are not uniform across datasets, and there is no clear information on the acquisition device. Here we release a multi-annotation, multi-quality, and multi-device color fundus image dataset for glaucoma analysis on an original challenge -- Retinal Fundus Glaucoma Challenge 2nd Edition (REFUGE2). The REFUGE2 dataset contains 2000 color fundus images with annotations of glaucoma classification, optic disc/cup segmentation, as well as fovea localization. Meanwhile, the REFUGE2 challenge sets three sub-tasks of automatic glaucoma diagnosis and fundus structure analysis and provides an online evaluation framework. Based on the characteristics of multi-device and multi-quality data, some methods with strong generalizations are provided in the challenge to make the predictions more robust. This shows that REFUGE2 brings attention to the characteristics of real-world multi-domain data, bridging the gap between scientific research and clinical application.
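Deep learning (DL) models have provided state-of-the-art performance in a wide variety of medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder the translation of DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way towards clinical translation. Recently, a number of uncertainty estimation methods have been introduced for DL medical image segmentation tasks. Developing metrics to evaluate and compare the performance of uncertainty measures will assist the end-user in making more informed decisions. In this study, we explore and evaluate a metric developed during the BraTS 2019-2020 task on uncertainty quantification (QU-BraTS), designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This metric (1) rewards uncertainty estimates that produce high confidence in correct assertions and those that assign low confidence levels to incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, and hence highlight the need for uncertainty quantification in medical image analysis. Our evaluation code is made publicly available at https://github.com/ragmeh11/qu-brats.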